Improved Forward Masking on a Generalized Logarithmic Scale for Robust Speech Recognition
Abstract
We previously proposed forward masking on a generalized logarithmic scale to eliminate convolutional noise and to suppress additive noise. While the generalized Dynamic Cepstrum derived from the masked spectrum is robust to both types of noise, its robustness to convolutional noise is slightly degraded compared with masking on the logarithmic scale, and the optimal masking coefficient depends on the SNR and the type of noise. This paper addresses these issues by applying variance normalization and by controlling the masking level according to the estimated SNR. Recognition tests using the Aurora2 database show that variance normalization improves the word accuracy even for test set C, which includes MIRS distortion, from 79.2% for the logarithmic scale to 84.0% for the generalized logarithmic scale, and that two-level masking depending on the SNR improves the word accuracy for speech babble noise at 10 dB from 83.7% to 89.9%.
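The abstract does not give formulas, so the following is only a rough sketch of how the three ingredients it names could fit together. It assumes the common definition of the generalized logarithm, (x^γ - 1)/γ, which tends to log x as γ approaches 0. The masking pattern (a decaying average of preceding frames), the function names, and every numeric value (gamma, decay, n_prev, the SNR threshold, the two masking coefficients) are hypothetical illustrations, not the authors' settings.

```python
import numpy as np

def generalized_log(x, gamma):
    """Generalized logarithm (x**gamma - 1) / gamma; equals log(x) at gamma == 0."""
    x = np.asarray(x, dtype=float)
    if gamma == 0.0:
        return np.log(x)
    return (x ** gamma - 1.0) / gamma

def forward_mask(scaled, alpha, decay=0.5, n_prev=3):
    """Subtract a masking pattern built from the preceding frames.

    scaled: (frames, bands) spectrum already on the generalized log scale.
    The pattern is a decaying weighted average of up to n_prev earlier
    frames, scaled by the masking coefficient alpha. This is a simplified
    stand-in for the masking pattern used in the paper.
    """
    weights = decay ** np.arange(1, n_prev + 1)
    masked = scaled.copy()
    for t in range(1, scaled.shape[0]):
        k = min(n_prev, t)
        w = weights[:k] / weights[:k].sum()   # weights sum to 1
        prev = scaled[t - k:t][::-1]          # most recent frame first
        masked[t] = scaled[t] - alpha * (w[:, None] * prev).sum(axis=0)
    return masked

def variance_normalize(features, eps=1e-8):
    """Scale each feature dimension to unit variance over the utterance."""
    return features / (features.std(axis=0) + eps)

def select_alpha(snr_db, threshold_db=15.0, alpha_below=0.8, alpha_above=0.4):
    """Two-level masking: pick one of two coefficients from the estimated SNR.

    The threshold and both coefficients are made-up placeholder values.
    """
    return alpha_below if snr_db < threshold_db else alpha_above

# Hypothetical usage on a random (frames, bands) power spectrum.
rng = np.random.default_rng(0)
spectrum = rng.uniform(0.5, 2.0, size=(20, 8))
alpha = select_alpha(snr_db=10.0)
features = variance_normalize(
    forward_mask(generalized_log(spectrum, gamma=0.1), alpha)
)
```

In this sketch, setting gamma to 0 and alpha to 1 makes a fixed per-band gain cancel exactly from the second frame onward, because the gain becomes an additive constant on the log scale and the pattern weights sum to one. For nonzero gamma that cancellation is only approximate, which is in line with the abstract's remark that robustness to convolutional noise is slightly lower on the generalized scale.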
Similar resources
Forward masking on a generalized logarithmic scale for robust speech recognition
This paper examines forward masking on the generalized logarithmic scale for speech recognition that is robust to both additive and convolutional noise. The forward masking in the dynamic cepstral (DyC) representation is based upon subtraction of a masking pattern from the current spectrum in the logarithmic spectral domain, whereas the proposed method intends to make a compromise between the logarithm...
Evaluation of a generalized dynamic cepstrum in distant speech recognition
This paper examines the effectiveness of a generalized dynamic cepstrum in distant speech recognition. The generalized dynamic cepstrum (DyMFGC) is based upon the forward masking on the generalized logarithmic spectrum instead of the log-spectrum, which intends to make it robust to additive noise as well as convolutional noise. Digit recognition tests were carried out in a relatively quiet and ...
A model of dynamic auditory perception and its application to robust word recognition
This paper describes two mechanisms that augment the common automatic speech recognition (ASR) front end and provide adaptation and isolation of local spectral peaks. A dynamic model consisting of a linear filterbank with a novel additive logarithmic adaptation stage after each filter output is proposed. An extensive series of perceptual forward masking experiments, together with previously rep...
An auditory feature extraction method based on forward-masking and its application in robust speaker identification and speech recognition
This article presents a new auditory feature extraction method, which considers the forward-masking mechanism of auditory nerves and is feasible in practice. Two features based on this method are extracted: FMFRC (forward masking firing-rate cepstrum) and FMSRC (forward masking synchronized rate cepst...
Generalized-Log Spectral Mean Normalization for Speech Recognition
Most compensation methods for robust speech recognition against noise assume independence between speech, additive noise, and convolutive noise. However, the nonlinear distortion caused by noise may introduce correlation between noise and speech. To tackle this issue, we propose generalized-log spectral mean normalization (GLSMN) in which log spectral mean normalization (LSMN) is carried out in...
Publication date: 2004